PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects
We address the task of simultaneous part-level reconstruction and motion
parameter estimation for articulated objects. Given two sets of multi-view
images of an object in two static articulation states, we decouple the movable
part from the static part and reconstruct shape and appearance while predicting
the motion parameters. To tackle this problem, we present PARIS: a
self-supervised, end-to-end architecture that learns part-level implicit shape
and appearance models and optimizes motion parameters jointly without any 3D
supervision, motion, or semantic annotation. Our experiments show that our
method generalizes better across object categories, and outperforms baselines
and prior work that are given 3D point clouds as input. Our approach improves
reconstruction relative to state-of-the-art baselines with a Chamfer-L1
distance reduction of 3.94 (45.2%) for objects and 26.79 (84.5%) for parts, and
achieves a 5% error rate for motion estimation across 10 object categories.
Video summary at: https://youtu.be/tDSrROPCgUc
Presented at ICCV 2023. Project website: https://3dlg-hcvc.github.io/paris
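As a rough illustration of the joint optimization the abstract describes, the sketch below fits two small implicit fields (static and movable part) together with learnable revolute-joint parameters from two articulation states. The network sizes, the Rodrigues parameterization, the union operator, and the toy density-fitting loss are assumptions for illustration; the actual method supervises the fields through multi-view rendering.

```python
# Hedged sketch: jointly optimizing static/movable implicit fields plus a
# revolute joint's axis, pivot, and angle, in the spirit of the PARIS setup.
import torch
import torch.nn as nn

class ImplicitField(nn.Module):
    """Tiny MLP mapping 3D points to a density value (placeholder for a NeRF-style field)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

class TwoStateArticulatedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.static_field = ImplicitField()
        self.movable_field = ImplicitField()
        # Learnable revolute joint: axis direction, pivot point, and the angle
        # separating the two observed articulation states.
        self.axis = nn.Parameter(torch.tensor([0.0, 0.0, 1.0]))
        self.pivot = nn.Parameter(torch.zeros(3))
        self.angle = nn.Parameter(torch.tensor(0.1))

    def rotation_matrix(self):
        # Rodrigues' formula for a rotation about the learned axis by the learned angle.
        a = self.axis / (self.axis.norm() + 1e-8)
        zero = torch.zeros((), dtype=a.dtype)
        K = torch.stack([
            torch.stack([zero, -a[2], a[1]]),
            torch.stack([a[2], zero, -a[0]]),
            torch.stack([-a[1], a[0], zero]),
        ])
        I = torch.eye(3)
        return I + torch.sin(self.angle) * K + (1 - torch.cos(self.angle)) * (K @ K)

    def density(self, pts, state):
        """Composite density at query points for articulation state 0 or 1."""
        static = self.static_field(pts)
        if state == 0:
            movable = self.movable_field(pts)
        else:
            # Evaluate the movable part in its canonical frame by undoing the joint motion.
            R = self.rotation_matrix()
            canonical = (pts - self.pivot) @ R + self.pivot
            movable = self.movable_field(canonical)
        return torch.maximum(static, movable)  # simple union of the two parts

# Toy optimization loop: the real method derives supervision from multi-view
# photometric losses; here the fields just fit random target densities.
model = TwoStateArticulatedModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
pts = torch.rand(256, 3)
target0, target1 = torch.rand(256, 1), torch.rand(256, 1)
for _ in range(10):
    loss = ((model.density(pts, 0) - target0) ** 2).mean() \
         + ((model.density(pts, 1) - target1) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```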
DAHiTrA: Damage Assessment Using a Novel Hierarchical Transformer Architecture
This paper presents DAHiTrA, a novel deep-learning model with hierarchical
transformers to classify building damages based on satellite images in the
aftermath of hurricanes. An automated building damage assessment provides
critical information for decision making and resource allocation for rapid
emergency response. Satellite imagery provides real-time, high-coverage
information and offers opportunities to inform large-scale post-disaster
building damage assessment. In addition, deep-learning methods have shown
promise in classifying building damage. In this work, a novel
transformer-based network is proposed for assessing building damage. This
network leverages hierarchical spatial features of multiple resolutions and
captures temporal difference in the feature domain after applying a transformer
encoder on the spatial features. The proposed network achieves
state-of-the-art performance when tested on a large-scale disaster damage
dataset (xBD) for building localization and damage classification, as well as
on the LEVIR-CD dataset for change detection tasks. In addition, we introduce a new
high-resolution satellite imagery dataset, Ida-BD (related to Hurricane Ida in
Louisiana in 2021), for domain adaptation to further evaluate
the capability of the model to be applied to newly damaged areas with scarce
data. The domain adaptation results indicate that the proposed model can be
adapted to a new event with only limited fine-tuning. Hence, the proposed model
advances the current state of the art through better performance and domain
adaptation. Also, Ida-BD provides a higher-resolution annotated dataset for
future studies in this field.
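A minimal sketch of the bi-temporal design the abstract outlines: a shared backbone yields features at two resolutions, a transformer encoder runs over each resolution's spatial tokens, and the temporal difference is taken in feature space before a per-pixel damage head. All layer sizes, the two-scale fusion, and the toy head are placeholder assumptions, not the published architecture.

```python
# Hedged sketch: bi-temporal damage classification with per-resolution
# transformer encoders and a feature-space temporal difference.
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Produces features at two resolutions from an input image."""
    def __init__(self, c=32):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, c, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(c, 2 * c, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        f1 = self.stage1(x)   # 1/2 resolution
        f2 = self.stage2(f1)  # 1/4 resolution
        return [f1, f2]

class BiTemporalDamageNet(nn.Module):
    def __init__(self, num_classes=5, c=32):
        super().__init__()
        self.backbone = TinyBackbone(c)
        # One transformer encoder per feature resolution, applied to flattened spatial tokens.
        self.encoders = nn.ModuleList([
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
                num_layers=1)
            for d in (c, 2 * c)
        ])
        self.head = nn.Conv2d(c + 2 * c, num_classes, 1)

    def encode(self, img):
        feats = []
        for feat, enc in zip(self.backbone(img), self.encoders):
            b, ch, h, w = feat.shape
            seq = feat.flatten(2).transpose(1, 2)              # (B, H*W, C) tokens
            feats.append(enc(seq).transpose(1, 2).reshape(b, ch, h, w))
        return feats

    def forward(self, pre, post):
        pre_f, post_f = self.encode(pre), self.encode(post)
        # Temporal difference in the feature domain, per resolution, then fuse.
        diffs = [torch.abs(a - b) for a, b in zip(pre_f, post_f)]
        up = nn.functional.interpolate(diffs[1], size=diffs[0].shape[-2:],
                                       mode="bilinear", align_corners=False)
        fused = torch.cat([diffs[0], up], dim=1)
        return self.head(fused)  # per-pixel damage logits at 1/2 input resolution

# Example forward pass on random pre-/post-disaster tiles.
net = BiTemporalDamageNet()
pre, post = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
print(net(pre, post).shape)  # torch.Size([1, 5, 32, 32])
```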
DCSG: Unsupervised Learning of Compact CSG Trees with Dual Complements and Dropouts
We present DCSG, a neural model composed of two dual and complementary
network branches, with dropouts, for unsupervised learning of compact
constructive solid geometry (CSG) representations of 3D CAD shapes. Our network
is trained to reconstruct a 3D shape by a fixed-order assembly of quadric
primitives, with both branches producing a union of primitive intersections or
inverses. A key difference between DCSG and all prior neural CSG models is
its dedicated residual branch to assemble the potentially complex shape
complement, which is subtracted from an overall shape modeled by the cover
branch. With the shape complements, our network is provably general, while the
weight dropout further improves compactness of the CSG tree by removing
redundant primitives. We demonstrate both quantitatively and qualitatively that
DCSG produces compact CSG reconstructions with superior quality and more
natural primitives than all existing alternatives, especially over complex and
high-genus CAD shapes.
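The following toy sketch mirrors the composition described above: each branch builds a soft union of intersections over a fixed bank of learnable quadric primitives, the residual branch's shape is subtracted from the cover branch's, and dropout on the selection weights stands in for pruning redundant primitives. The primitive counts, the soft-logic relaxation, and the occupancy target are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: a differentiable cover-minus-residual CSG assembly over
# learnable quadric primitives, with dropout on the selection weights.
import torch
import torch.nn as nn

class QuadricPrimitives(nn.Module):
    """A bank of learnable quadrics q(x) = x^T A x + b^T x + c; inside where q(x) <= 0."""
    def __init__(self, n_prims):
        super().__init__()
        self.A = nn.Parameter(torch.randn(n_prims, 3, 3) * 0.1)
        self.b = nn.Parameter(torch.randn(n_prims, 3) * 0.1)
        self.c = nn.Parameter(torch.randn(n_prims) * 0.1)

    def forward(self, pts):                              # pts: (N, 3)
        quad = torch.einsum("nd,pde,ne->pn", pts, self.A, pts)
        lin = torch.einsum("pd,nd->pn", self.b, pts)
        return quad + lin + self.c[:, None]              # (P, N) signed values

def soft_inside(vals, beta=20.0):
    return torch.sigmoid(-beta * vals)                   # ~1 inside, ~0 outside

class CSGBranch(nn.Module):
    """Union of intersections over a fixed set of primitives, with weight dropout."""
    def __init__(self, n_prims=8, n_intersections=4, drop=0.2):
        super().__init__()
        self.prims = QuadricPrimitives(n_prims)
        # Soft selection weights: which primitives join each intersection node.
        self.select = nn.Parameter(torch.rand(n_intersections, n_prims))
        self.dropout = nn.Dropout(drop)

    def forward(self, pts):
        occ = soft_inside(self.prims(pts))                         # (P, N)
        w = self.dropout(torch.sigmoid(self.select))               # (I, P)
        # Intersection ~ weighted product of occupancies; union ~ max over nodes.
        inter = torch.exp(torch.matmul(w, torch.log(occ + 1e-6)))  # (I, N)
        return inter.max(dim=0).values                             # (N,)

class DualCSG(nn.Module):
    def __init__(self):
        super().__init__()
        self.cover = CSGBranch()
        self.residual = CSGBranch()

    def forward(self, pts):
        # Final occupancy: the cover shape minus the residual (complement) shape.
        return self.cover(pts) * (1.0 - self.residual(pts))

# Toy fit against a sphere occupancy target.
model = DualCSG()
pts = torch.rand(512, 3) * 2 - 1
target_occ = (pts.norm(dim=1) < 0.7).float()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(20):
    loss = ((model(pts) - target_occ) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```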
SKED: Sketch-guided Text-based 3D Editing
Text-to-image diffusion models are gradually introduced into computer
graphics, recently enabling the development of Text-to-3D pipelines in an open
domain. However, for interactive editing purposes, local manipulations of
content through a simplistic textual interface can be arduous. Incorporating
user-guided sketches with Text-to-image pipelines offers users more intuitive
control. Still, as state-of-the-art Text-to-3D pipelines rely on optimizing
Neural Radiance Fields (NeRF) through gradients from arbitrary rendering views,
conditioning on sketches is not straightforward. In this paper, we present
SKED, a technique for editing 3D shapes represented by NeRFs. Our technique
utilizes as few as two guiding sketches from different views to alter an
existing neural field. The edited region respects the prompt semantics through
a pre-trained diffusion model. To ensure the generated output adheres to the
provided sketches, we propose novel loss functions to generate the desired
edits while preserving the density and radiance of the base instance. We
demonstrate the effectiveness of our proposed method through several
qualitative and quantitative experiments.
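A minimal sketch of the loss structure the abstract describes: a guidance term drives new content inside the sketch-indicated region, while preservation terms keep the density and radiance of the base field elsewhere. The toy field, the hand-made edit mask, and the placeholder guidance term (which in the real method would come from a pretrained text-to-image diffusion model on rendered views) are assumptions for illustration.

```python
# Hedged sketch: editing a copy of a base field under a guidance loss inside the
# edit region plus density/radiance preservation outside it.
import torch
import torch.nn as nn

class ToyField(nn.Module):
    """Maps 3D points to (density, rgb); placeholder for a NeRF."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 4))

    def forward(self, pts):
        out = self.net(pts)
        return out[..., :1], torch.sigmoid(out[..., 1:])  # density, rgb

def preservation_loss(edited, base, pts, edit_mask):
    """Penalize changes to density/radiance only where the sketches do NOT apply."""
    d_e, c_e = edited(pts)
    with torch.no_grad():
        d_b, c_b = base(pts)
    keep = (1.0 - edit_mask)[:, None]
    return (keep * (d_e - d_b) ** 2).mean() + (keep * (c_e - c_b) ** 2).mean()

def guidance_loss(edited, pts, edit_mask):
    # Placeholder: the real method uses score-distillation-style gradients from a
    # pretrained diffusion model; here we just encourage density in the edit region.
    d_e, _ = edited(pts)
    return -(edit_mask[:, None] * d_e).mean()

base = ToyField()
edited = ToyField()
edited.load_state_dict(base.state_dict())      # start the edit from the base field
opt = torch.optim.Adam(edited.parameters(), lr=1e-3)

pts = torch.rand(1024, 3) * 2 - 1
edit_mask = (pts[:, 0] > 0.5).float()          # toy stand-in for the sketch-defined region

for _ in range(20):
    loss = guidance_loss(edited, pts, edit_mask) \
         + 10.0 * preservation_loss(edited, base, pts, edit_mask)
    opt.zero_grad(); loss.backward(); opt.step()
```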
SLiMe: Segment Like Me
Significant strides have been made using large vision-language models, like
Stable Diffusion (SD), for a variety of downstream tasks, including image
editing, image correspondence, and 3D shape generation. Inspired by these
advancements, we explore leveraging these extensive vision-language models for
segmenting images at any desired granularity using as few as one annotated
sample by proposing SLiMe. SLiMe frames this problem as an optimization task.
Specifically, given a single training image and its segmentation mask, we first
extract attention maps, including our novel "weighted accumulated
self-attention map" from the SD prior. Then, using the extracted attention
maps, the text embeddings of Stable Diffusion are optimized such that each of
them learns about a single segmented region from the training image. These
learned embeddings then highlight the segmented region in the attention maps,
which in turn can be used to derive the segmentation map. This enables
SLiMe to segment any real-world image during inference with the granularity of
the segmented region in the training image, using just one example. Moreover,
leveraging additional training data when available, i.e., the few-shot setting,
improves the performance of SLiMe. We carried out a comprehensive set of experiments
examining various design factors and showed that SLiMe outperforms other
existing one-shot and few-shot segmentation methods.
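A small sketch of the optimization pattern suggested above: one learnable text embedding per segment is fit so that its attention map over image features matches the provided mask, and the same embeddings are then used to segment new images. The ToyAttention module below is a stand-in assumption for the Stable Diffusion attention maps used in the paper.

```python
# Hedged sketch: fit per-segment text embeddings against one annotated image,
# then reuse them to segment a new image at the same granularity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyAttention(nn.Module):
    """Stand-in: pixel features attend to the learnable text embeddings."""
    def __init__(self, dim=64):
        super().__init__()
        self.features = nn.Conv2d(3, dim, 3, padding=1)    # placeholder for SD features

    def forward(self, image, text_emb):
        f = self.features(image)                            # (B, D, H, W)
        b, d, h, w = f.shape
        q = f.flatten(2).transpose(1, 2)                     # (B, H*W, D) pixel queries
        scores = q @ text_emb.T / d ** 0.5                   # (B, H*W, K) attention logits
        return scores.transpose(1, 2).reshape(b, -1, h, w)   # (B, K, H, W) per-class maps

num_classes = 3
extractor = ToyAttention()
for p in extractor.parameters():
    p.requires_grad_(False)                                  # the backbone stays frozen
text_emb = nn.Parameter(torch.randn(num_classes, 64))        # one embedding per segment

# One annotated example: a training image and its segmentation mask.
image = torch.rand(1, 3, 32, 32)
mask = torch.randint(0, num_classes, (1, 32, 32))

opt = torch.optim.Adam([text_emb], lr=1e-2)
for _ in range(50):
    maps = extractor(image, text_emb)                        # (1, K, 32, 32)
    loss = F.cross_entropy(maps, mask)                       # attention maps should match the mask
    opt.zero_grad(); loss.backward(); opt.step()

# Inference: the learned embeddings segment a new image with the same granularity.
new_image = torch.rand(1, 3, 32, 32)
pred = extractor(new_image, text_emb).argmax(dim=1)          # (1, 32, 32) label map
```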
MaskTune: Mitigating Spurious Correlations by Forcing to Explore
A fundamental challenge of over-parameterized deep learning models is
learning meaningful data representations that yield good performance on a
downstream task without over-fitting spurious input features. This work
proposes MaskTune, a masking strategy that prevents over-reliance on spurious
(or a limited number of) features. MaskTune forces the trained model to explore
new features during a single epoch of finetuning by masking previously discovered
features. MaskTune, unlike earlier approaches for mitigating shortcut learning,
does not require any supervision, such as annotating spurious features or
labels for subgroup samples in a dataset. Our empirical results on biased
MNIST, CelebA, Waterbirds, and ImageNet-9L datasets show that MaskTune is
effective on tasks that often suffer from the existence of spurious
correlations. Finally, we show that MaskTune outperforms or achieves similar
performance to the competing methods when applied to the selective
classification (classification with rejection option) task. Code for MaskTune
is available at https://github.com/aliasgharkhani/Masktune.
Accepted to NeurIPS 2022.
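A compact sketch of the two-stage recipe the abstract describes: ordinary training, then one finetuning pass on inputs whose most influential pixels have been masked out. Plain input-gradient saliency, the toy MLP, the random data, and the keep_fraction threshold are illustrative assumptions rather than the paper's exact attribution method or schedule.

```python
# Hedged sketch: train normally, mask the features the model already relies on,
# then finetune briefly on the masked data to force exploration of new features.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy "dataset": random images and labels standing in for e.g. biased MNIST.
images = torch.rand(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))

def saliency_mask(model, x, y, keep_fraction=0.8):
    """Zero out the most influential pixels so finetuning must explore new features."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    scores = grad.abs().flatten(1)
    k = int(keep_fraction * scores.shape[1])
    thresh = scores.kthvalue(k, dim=1).values[:, None]   # per-sample cutoff
    mask = (scores <= thresh).float().view_as(x)         # 1 where the pixel is kept
    return (x * mask).detach()

# Stage 1: ordinary (ERM) training on the original data.
for _ in range(5):
    loss = F.cross_entropy(model(images), labels)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: a single finetuning pass on the masked data.
masked = saliency_mask(model, images, labels)
loss = F.cross_entropy(model(masked), labels)
opt.zero_grad(); loss.backward(); opt.step()
```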